16 research outputs found

    Link homophily in the application layer and its usage in traffic classification

    Get PDF
    Abstract-This paper addresses the following questions. Is there link homophily in the application layer traffic? If so, can it be used to accurately classify traffic in network trace data without relying on payloads or properties at the flow level? Our research shows that the answers to both of these questions are affirmative in real network trace data. Specifically, we define link homophily to be the tendency for flows with common IP hosts to have the same application (P2P, Web, etc.) compared to randomly selected flows. The presence of link homophily in trace data provides us with statistical dependencies between flows that share common IP hosts. We utilize these dependencies to classify application layer traffic without relying on payloads or properties at the flow level. In particular, we introduce a new statistical relational learning algorithm, called Neighboring Link Classifier with Relaxation Labeling (NLC+RL). Our algorithm has no training phase and does not require features to be constructed. All that it needs to start the classification process is traffic information on a small portion of the initial flows, which we refer to as seeds. In all our traces, NLC+RL achieves above 90% accuracy with less than 5% seed size; it is robust to errors in the seeds and various seed-selection biases; and it is able to accurately classify challenging traffic such as P2P with over 90% Precision and Recall

    The state of peer-to-peer network simulators

    Get PDF
    Networking research often relies on simulation in order to test and evaluate new ideas. An important requirement of this process is that results must be reproducible so that other researchers can replicate, validate and extend existing work. We look at the landscape of simulators for research in peer-to-peer (P2P) networks by conducting a survey of a combined total of over 280 papers from before and after 2007 (the year of the last survey in this area), and comment on the large quantity of research using bespoke, closed-source simulators. We propose a set of criteria that P2P simulators should meet, and poll the P2P research community for their agreement. We aim to drive the community towards performing their experiments on simulators that allow for others to validate their results

    Automatic parsing of text-based application protocols using network traffic data

    No full text
    A method for analyzing a protocol of a network, comprising: obtaining a plurality of conversations from the network, wherein each of the plurality of conversations comprises a sequence of messages exchanged between a server and a client of the network using the protocol, wherein each message of the sequence of messages comprise one or more fields separated by a field delimiter of the protocol; extracting, by a computer processor, a plurality of non-alphanumeric tokens from the plurality of conversations, wherein the plurality of non-alphanumeric tokens comprises a non-alphanumeric token associated with a frequency of occurrence in the plurality of conversations; selecting, based on the frequency of occurrence meeting a pre-determined field delimiter candidate selection criterion, the non-alphanumeric token as a field delimiter candidate; dividing, by the computer processor and using the field delimiter candidate, each of the plurality of conversations into a plurality of slices; analyzing, by the computer processor and using a pre-determined field delimiter candidate scoring algorithm, content included in the plurality of slices to: determine a statistical measure of matched slices for each of the plurality of conversations, wherein the statistical measure of matched slices corresponds to an exact-matched-slices percentage and a prefix-matched-slices percentage that are normalized based on an average number of slices per conversation; determine a field delimiter candidate score by aggregating the statistical measure of matched slices for all of the plurality of conversations; and selecting, by the computer processor and based on the field delimiter candidate score associated with the non-alphanumeric token, the non-alphanumeric token as the field delimiter of the protocol

    Towards automatic protocol field inference

    Get PDF
    Security tools have evolved dramatically in the recent years to combat the increasingly complex nature of attacks. However, these tools need to be configured by experts that understand network protocols thoroughly to be effective. In this paper, we present a system called FieldHunter, which automatically extracts fields and infers their types. This information is invaluable for security experts to keep pace with the increasing rate of development of new network applications and their underlying protocols. FieldHunter relies on collecting application messages from multiple sessions. Then, it performs field extraction and inference of their types by taking into consideration statistical correlations between different messages or other associations with meta-data such as message length, client or server IP addresses. We evaluated FieldHunter on real network traffic collected in ISP networks from three different continents. FieldHunter was able to extract security relevant fields and infer their types for well documented network protocols (such as DNS and MSNP) as well as protocols for which the specifications are not publicly available (such as SopCast). Further, we developed a payload-based anomaly detection system for industrial control systems using FieldHunter. The proposed system is able to identify industrial devices behaving oddly, without any previous knowledge of the protocols being used
    corecore